Tips for R Markdown Reports
Introduction
When presenting a data overview and exploratory analysis results, we used to copy lots of tables and charts from RStudio to PowerPoint, which made preparing the presentation painful. It has become essential for data scientists to use better reporting tools, such as R Markdown and Jupyter notebooks, to author analysis presentations in a more efficient and organized way. Of course, we also want the result to be reproducible!
In this post, I would like to share some tips I picked up while building analysis reports with R Markdown/notebooks.
R markdown
Yihui Xie provides a very comprehensive and up-to-date R Markdown guide at https://bookdown.org/yihui/rmarkdown/, which explains well how to make a table of contents, fold code snippets, configure the YAML header, and more.
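As a rough sketch of the options mentioned above, the settings that normally live in the YAML header can also be passed from R through rmarkdown::html_document(). The file name report.Rmd is a placeholder and not part of this post, so the render() call is left commented out.

```r
library(rmarkdown)

# Output format with a floating table of contents and foldable code chunks.
# These arguments mirror the `toc`, `toc_float` and `code_folding` keys
# that would otherwise sit under `html_document:` in the YAML header.
report_format <- html_document(
  toc = TRUE,
  toc_float = TRUE,
  code_folding = "hide"
)

# "report.Rmd" is a hypothetical file name used only for illustration.
# render("report.Rmd", output_format = report_format)
```

Keeping the format in an R object can be handy when the same report is rendered from a script, but the YAML header remains the more common place for these settings.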
Tabs
One thing I found especially useful is the tabbed section. A tab layout helps condense parallel, lengthy content in the report.
Simply put the {.tabset} attribute after a markdown header and its sub-headers will become tabs. The following snippet gives an example:
# Tabs {.tabset}
## Header2 - Tab1
this is tab1
## Header2 - Tab2
this is tab2
Header2 - Tab1
this is tab1
Header2 - Tab2
this is tab2
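When the number of tabs depends on the data, the same markdown can be generated from R instead of being typed by hand. The sketch below assumes it runs in a chunk with the knitr option results = 'asis', so that the printed text is treated as markdown; make_tabset is a hypothetical helper written for this illustration, not a function from any package.

```r
# Build the markdown for a tabbed section from a named list:
# names become the tab titles, values become the tab bodies.
make_tabset <- function(title, tabs) {
  header <- sprintf("# %s {.tabset}", title)
  body <- sprintf("## %s\n\n%s", names(tabs), unlist(tabs))
  paste(c(header, body), collapse = "\n\n")
}

md <- make_tabset(
  "Tabs",
  list("Header2 - Tab1" = "this is tab1",
       "Header2 - Tab2" = "this is tab2")
)

# In a chunk with results = 'asis', this prints the tabbed section.
cat(md, "\n")
```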
Tables
The native markdown table isn’t very user-friendly, so we usually rely on functions such as knitr::kable or DT::datatable to render a table from a data.frame.
Here are some tips on choosing between kable and datatable.
- kable has simpler syntax and gives more appealing, “table-like” tables in most themes.
- datatable has more capabilities, such as paged tables with download buttons. More configuration options are described in its JavaScript API specification.
In a nutshell, kable is preferable for smaller tables, while datatable is preferable for bigger tables.
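The rule of thumb above could be wrapped in a small helper. This is only a sketch: render_table is a hypothetical function, and the 20-row threshold is an arbitrary assumption rather than a recommendation from either package.

```r
library(knitr)
library(DT)

# Pick a renderer based on table size:
# static kable for small tables, paged datatable for larger ones.
render_table <- function(df, max_kable_rows = 20) {
  if (nrow(df) <= max_kable_rows) {
    kable(df, digits = 1)
  } else {
    datatable(df)
  }
}

small <- render_table(head(mtcars))  # 6 rows, rendered with kable
large <- render_table(mtcars)        # 32 rows, rendered with datatable
```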
markdown table
some random markdown table
| Tables | Are | Cool |
|----------|:-------------:|------:|
| col 1 is | left-aligned | $1600 |
| col 2 is | centered | $12 |
| col 3 is | right-aligned | $1 |
| Tables | Are | Cool |
|---|---|---|
| col 1 is | left-aligned | $1600 |
| col 2 is | centered | $12 |
| col 3 is | right-aligned | $1 |
kable
kable is from the knitr package; the styling functions below come from kableExtra.
library(knitr)
library(kableExtra)  # also provides the %>% pipe used below
mtcars %>%
  head() %>%
  kable(digits = 1, caption = 'example of kable table') %>%
  kable_styling(full_width = FALSE, position = 'left') %>%
  # row 0 is the header row
  row_spec(0,
           bold = TRUE,
           color = 'white',
           background = 'black')
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.6 | 16.5 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.9 | 17.0 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.9 | 2.3 | 18.6 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.1 | 3.2 | 19.4 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.1 | 3.4 | 17.0 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.8 | 3.5 | 20.2 | 1 | 0 | 3 | 1 |
datatable
datatable is from the DT package
JS - DataTables
options list: https://datatables.net/reference/option/
R - DT package
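A minimal sketch of a paged table with download buttons, using the Buttons extension that ships with DT. The page length and the choice of buttons are illustrative assumptions; the full list of options is in the DataTables reference linked above.

```r
library(DT)

dt <- datatable(
  mtcars,
  extensions = "Buttons",
  options = list(
    pageLength = 5,                       # rows shown per page
    dom = "Bfrtip",                       # B = buttons, f = search box, t = table, i = info, p = paging
    buttons = c("copy", "csv", "excel")   # export buttons shown above the table
  )
)

# Printing the widget in an HTML R Markdown document renders the table.
dt
```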
Data Summary
summarytools
library(summarytools)
mtcars %>%
  dfSummary(style = 'grid',
            graph.magnif = 0.75,
            plain.ascii = FALSE,
            valid.col = FALSE,
            tmp.img.dir = "/tmp") %>%
  print()
Data Frame Summary
mtcars
Dimensions: 32 x 11
Duplicates: 0
| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing |
|---|---|---|---|---|---|
| 1 | mpg [numeric] | Mean (sd) : 20.1 (6); min < med < max: 10.4 < 19.2 < 33.9; IQR (CV) : 7.4 (0.3) | 25 distinct values | | 0 (0%) |
| 2 | cyl [numeric] | Mean (sd) : 6.2 (1.8); min < med < max: 4 < 6 < 8; IQR (CV) : 4 (0.3) | 4 : 11 (34.4%); 6 : 7 (21.9%); 8 : 14 (43.8%) | | 0 (0%) |
| 3 | disp [numeric] | Mean (sd) : 230.7 (123.9); min < med < max: 71.1 < 196.3 < 472; IQR (CV) : 205.2 (0.5) | 27 distinct values | | 0 (0%) |
| 4 | hp [numeric] | Mean (sd) : 146.7 (68.6); min < med < max: 52 < 123 < 335; IQR (CV) : 83.5 (0.5) | 22 distinct values | | 0 (0%) |
| 5 | drat [numeric] | Mean (sd) : 3.6 (0.5); min < med < max: 2.8 < 3.7 < 4.9; IQR (CV) : 0.8 (0.1) | 22 distinct values | | 0 (0%) |
| 6 | wt [numeric] | Mean (sd) : 3.2 (1); min < med < max: 1.5 < 3.3 < 5.4; IQR (CV) : 1 (0.3) | 29 distinct values | | 0 (0%) |
| 7 | qsec [numeric] | Mean (sd) : 17.8 (1.8); min < med < max: 14.5 < 17.7 < 22.9; IQR (CV) : 2 (0.1) | 30 distinct values | | 0 (0%) |
| 8 | vs [numeric] | Min : 0; Mean : 0.4; Max : 1 | 0 : 18 (56.2%); 1 : 14 (43.8%) | | 0 (0%) |
| 9 | am [numeric] | Min : 0; Mean : 0.4; Max : 1 | 0 : 19 (59.4%); 1 : 13 (40.6%) | | 0 (0%) |
| 10 | gear [numeric] | Mean (sd) : 3.7 (0.7); min < med < max: 3 < 4 < 5; IQR (CV) : 1 (0.2) | 3 : 15 (46.9%); 4 : 12 (37.5%); 5 : 5 (15.6%) | | 0 (0%) |
| 11 | carb [numeric] | Mean (sd) : 2.8 (1.6); min < med < max: 1 < 2 < 8; IQR (CV) : 2 (0.6) | 1 : 7 (21.9%); 2 : 10 (31.2%); 3 : 3 (9.4%); 4 : 10 (31.2%); 6 : 1 (3.1%); 8 : 1 (3.1%) | | 0 (0%) |
Static Plots
ggplot2 is our best friend for visualization in R, and it is well supported in R Markdown. Chaining functions with %>% and + makes the code chunk beautiful!
Often we would like to combine many sub-plots into one. ggplot2::facet_grid can do part of the job, but I found ggpubr::ggarrange more powerful, as it allows you to combine any plots and even tables. It’s cool to put a chart and a table side by side (an example is given in a subsequent section).
ggridges is another useful ggplot2 extension that draws multiple density plots in a single chart. This is often used to compare profiles between groups. See the details here: https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html
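A minimal ridgeline sketch on mtcars, comparing the mpg distribution across cylinder counts. The choice of variables, transparency and labels is only illustrative.

```r
library(ggplot2)
library(ggridges)

# One density ridge of mpg per number of cylinders.
p <- ggplot(mtcars, aes(x = mpg, y = factor(cyl), fill = factor(cyl))) +
  geom_density_ridges(alpha = 0.7) +
  labs(x = "mpg", y = "cyl", title = "example of ggridges in R markdown") +
  theme(legend.position = "none")

p
```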
ggplot
library(ggplot2)
library(dplyr)  # filter() and the %>% pipe
cor(mtcars) %>%
  as.data.frame() %>%
  tibble::rownames_to_column('var1') %>%
  tidyr::pivot_longer(-var1, names_to = 'var2', values_to = 'cor') %>%
  # keep one triangle of the symmetric correlation matrix
  filter(var1 <= var2) %>%
  ggplot(aes(x = var1, y = var2, fill = cor, label = round(cor, 2))) +
  geom_tile() +
  geom_text() +
  scale_fill_gradient2() +
  labs(title = 'example of ggplot2 in R markdown')
ggpubr
combine multiple charts or tables
library(ggpubr)
library(forcats)
library(dplyr)  # group_by(), summarise(), mutate() and the %>% pipe
# add .groups = 'drop' to silence the regrouping message from `dplyr`
data <- mtcars %>%
  group_by(gear) %>%
  summarise(n = n(), .groups = 'drop') %>%
  ungroup() %>%
  mutate(gear = fct_rev(factor(gear)))
plt <- data %>%
  ggplot(aes(x = gear, y = n)) +
  geom_bar(stat = 'identity', fill = 'lightblue') +
  coord_flip() +
  labs(title = 'example of combined table and plot using ggpubr')
tbl <- ggtexttable(data, rows = NULL)
ggarrange(plt, tbl, ncol = 2, nrow = 1, widths = c(2, 1))
Interactive Plots
This is where things become tricky. Interactive plots are only supported in HTML output, and there is no dominant interactive visualization package in the R ecosystem.
- plotly provides comprehensive chart types, documentation, and cross-language capability. However, I personally don’t like the style, the syntax, or the toolbar in the upper right corner.
- echarts4r is an R interface to the Echarts JavaScript library, which was open sourced by Baidu. I have tested the latest version, 0.3.2, and it works well with R Markdown.
- googleVis is an R interface to the Google Charts JavaScript library, which was of course developed by Google. The package has a good collection of chart types, but it has some unexplained incompatibilities with both R Markdown and Shiny. I’ve found a workaround to integrate googleVis charts in R Markdown, but it’s not perfect.
plotly
Plotly R documentation site: https://plotly.com/r/
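For reference, a minimal plotly sketch on mtcars; the variables and the coloring by cylinder count are illustrative choices only.

```r
library(plotly)

# Interactive scatter plot of weight against mpg, colored by cylinders.
fig <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  type = "scatter",
  mode = "markers"
)

# Printing the widget in an HTML R Markdown document renders the chart.
fig
```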
Echarts
- echarts4r: https://echarts4r.john-coene.com/
- Echarts site: https://echarts.apache.org/en/index.html
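A minimal echarts4r sketch on mtcars. The calls are written step by step rather than with a pipe so the block does not depend on which pipe operator is loaded; the variables are illustrative, and this was not run against the 0.3.2 release mentioned above.

```r
library(echarts4r)

chart <- e_charts(mtcars, wt)   # initialise the chart with wt on the x axis
chart <- e_scatter(chart, mpg)  # add mpg as a scatter series
chart <- e_tooltip(chart)       # show values on hover

# Printing the widget in an HTML R Markdown document renders the chart.
chart
```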
Google Charts
- googleVis: https://github.com/mages/googleVis
- Google Charts: https://developers.google.com/chart
self_contained: false is required for googleVis charts to render in R Markdown; refer to the related GitHub issue.
output:
  html_document:
    self_contained: false
# self_contained: false is required for googleVis charts to render in R markdown
suppressPackageStartupMessages(library(googleVis))
# make plot() emit only the chart markup instead of opening a browser page
op <- options(gvis.plot.tag = "chart")
# `dino` is a data frame that is not defined in this snippet
plot(gvisHistogram(dino))
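Since the data frame passed to gvisHistogram is not shown in this post, here is a self-contained sketch of the same workaround with stand-in data. The hist_data data frame, its column names, and the chart size are all assumptions made for illustration, and the chunk is assumed to use results = 'asis' so that the chart markup is passed through to the HTML output.

```r
suppressPackageStartupMessages(library(googleVis))

# Make plot() emit only the chart markup, as in the workaround above.
op <- options(gvis.plot.tag = "chart")

# Hypothetical stand-in data: two numeric columns of simulated counts.
set.seed(123)
hist_data <- data.frame(
  A = rpois(100, 20),
  B = rpois(100, 5)
)

hist_chart <- gvisHistogram(
  hist_data,
  options = list(width = 500, height = 300)
)

# With results = 'asis' this writes the chart's HTML and JavaScript.
plot(hist_chart)

# Restore the previous option value.
options(op)
```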